Transelliptical Component Analysis
نویسندگان
چکیده
We propose a high dimensional semiparametric scale-invariant principle component analysis, named TCA, by utilize the natural connection between the elliptical distribution family and the principal component analysis. Elliptical distribution family includes many well-known multivariate distributions like multivariate Gaussian, t and logistic and it is extended to the meta-elliptical by Fang et.al (2002) using the copula techniques. In this paper we extend the meta-elliptical distribution family to a even larger family, called transelliptical. We prove that TCA can obtain a near-optimal s √ log d/n estimation consistency rate in recovering the leading eigenvector of the latent generalized correlation matrix under the transelliptical distribution family, even if the distributions are very heavy-tailed, have infinite second moments, do not have densities and possess arbitrarily continuous marginal distributions. A feature selection result with explicit rate is also provided. TCA is further implemented in both numerical simulations and largescale stock data to illustrate its empirical usefulness. Both theories and experiments confirm that TCA can achieve model flexibility, estimation accuracy and robustness at almost no cost.
منابع مشابه
Transelliptical Graphical Models
We advocate the use of a new distribution family—the transelliptical—for robust inference of high dimensional graphical models. The transelliptical family is an extension of the nonparanormal family proposed by Liu et al. (2009). Just as the nonparanormal extends the normal by transforming the variables using univariate functions, the transelliptical extends the elliptical family in the same wa...
متن کاملCommunication-efficient Distributed Estimation and Inference for Transelliptical Graphical Models∗
We propose communication-efficient distributed estimation and inference methods for the transelliptical graphical model, a semiparametric extension of the elliptical distribution in the high dimensional regime. In detail, the proposed method distributes the d-dimensional data of size N generated from a transelliptical graphical model into m worker machines, and estimates the latent precision ma...
متن کاملSemiparametric Principal Component Analysis
We propose two new principal component analysis methods in this paper utilizing a semiparametric model. The according methods are named Copula Component Analysis (COCA) and Copula PCA. The semiparametric model assumes that, after unspecified marginally monotone transformations, the distributions are multivariate Gaussian. The COCA and Copula PCA accordingly estimate the leading eigenvectors of ...
متن کاملFantope Projection and Selection: A near-optimal convex relaxation of sparse PCA
We propose a novel convex relaxation of sparse principal subspace estimation based on the convex hull of rank-d projection matrices (the Fantope). The convex problem can be solved efficiently using alternating direction method of multipliers (ADMM). We establish a near-optimal convergence rate, in terms of the sparsity, ambient dimension, and sample size, for estimation of the principal subspac...
متن کاملPAGE: Robust Pattern Guided Estimation of Large Covariance Matrix
We study the problem of estimating large covariance matrices under two types of structural assumptions: (i) The covariance matrix is the summation of a low rank matrix and a sparse matrix, and we have some prior information on the sparsity pattern of the sparse matrix; (ii) The data follow a transelliptical distribution. The former structure regulates the parameter space and has its roots in di...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2012